Automatic Extraction of Japanese Grammar from a Bracketed Corpus

Author

  • Kiyoaki Shirai
Abstract

In recent years, numerous attempts have been made to derive context-free grammars (CFGs) from large corpora. In this paper, we describe a method for extracting a probabilistic context-free grammar (PCFG) of Japanese from a bracketed corpus, and propose two methods to improve it. Experiments show that the extracted PCFG achieves a 94% acceptance rate, 85% bracket recall, and 75% bracket precision.
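As a rough illustration of the idea in the abstract, the sketch below reads rules off labeled bracketed trees, estimates rule probabilities by relative frequency, and scores a parse by unlabeled bracket recall and precision. It is not the paper's method: the toy romanized corpus, the function names (parse_bracketed, estimate_pcfg, bracket_scores), the relative-frequency estimate, and the span-based definition of bracket recall and precision are all assumptions made for illustration, and the paper's two improvement methods are not reproduced.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse "(S (NP (N inu)) (VP (V hoeru)))" into nested tuples.

    A node is (label, child, ...); a leaf word is a plain string.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, *children)

    return read()


def count_rules(tree, counts):
    """Add one count for the rule at each node of the tree."""
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


def estimate_pcfg(bracketed_sentences):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    counts = Counter()
    for sentence in bracketed_sentences:
        count_rules(parse_bracketed(sentence), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def spans(tree, start=0):
    """Return (set of (start, end) word spans, end position) for a tree."""
    _, *children = tree
    found = set()
    pos = start
    for child in children:
        if isinstance(child, tuple):
            sub, pos = spans(child, pos)
            found |= sub
        else:
            pos += 1
    found.add((start, pos))
    return found, pos


def bracket_scores(gold_tree, predicted_tree):
    """Unlabeled bracket (recall, precision); identical spans count once."""
    gold, _ = spans(gold_tree)
    predicted, _ = spans(predicted_tree)
    matched = len(gold & predicted)
    return matched / len(gold), matched / len(predicted)


# Hypothetical toy corpus (romanized Japanese), for illustration only.
corpus = [
    "(S (NP (N inu)) (VP (V hoeru)))",
    "(S (NP (N neko)) (VP (NP (N sakana)) (V taberu)))",
]
pcfg = estimate_pcfg(corpus)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), round(prob, 3))
```

Lexical rules such as N -> inu are counted together with phrase-level rules here; a real system would typically separate the lexicon from the grammar and smooth the estimates.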

Similar resources

Towards Automatic Grammar Acquisition from a Bracketed Corpus

1 Introduction. Designing and refining a natural language grammar is a difficult and time-consuming task and requires a large amount of skilled effort. A hand-crafted grammar is usually not completely satisfactory and frequently fails to cover many unseen sentences. Automatic acquisition of grammars is a solution to this problem. Recently, with the increasing availability of large, mac...

Automatic Detection of Syllable Boundaries Combining the Advantages of Treebank and Bracketed Corpora Training

An approach to automatic detection of syllable boundaries is presented. We demonstrate the use of several manually constructed grammars trained with a novel algorithm combining the advantages of treebank and bracketed corpora training. We investigate the effect of the training corpus size on the performance of our system. The evaluation shows that a hand-written grammar performs better on findi...

Automatic Extraction of Grammars From Annotated Text

The primary objective of this project is to develop a robust, high-performance parser for English by automatically extracting a grammar from an annotated corpus of bracketed sentences, called the Treebank. The project is a collaboration between the IBM Continuous Speech Recognition Group and the University of Pennsylvania Department of Computer Sciences. Our initial focus is the domain of co...

Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis

This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Publication date: 1995